OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach

Internal identifier: 000822 (Main/Exploration); previous: 000821; next: 000823

Authors: Yan Gao [United States]; Ming Yang [United States]; Alok Choudhary [United States]

Source: Lecture Notes in Computer Science (ISSN 0302-9743), 2009

RBID : ISTEX:76B575CDBF69BDF5256683067175D726D5D4889A

Abstract

Image spam is a new trend in email spam. Recent image spam employs a variety of image processing techniques to add random noise. In this paper, we propose a semi-supervised approach, the regularized discriminant EM algorithm (RDEM), to detect image spam emails. It leverages a small amount of labeled data and a large amount of unlabeled data to identify spam and train a classification model simultaneously. Compared with fully supervised learning algorithms, semi-supervised learning is better suited to adversarial classification problems, because spammers actively protect their work by constantly making changes to circumvent spam detection, which makes it too costly for fully supervised learning to frequently collect sufficient labeled data for training. Experimental results demonstrate that our approach achieves a detection rate of 91.66% with a false positive rate of less than 2.96%, while significantly reducing the labeling cost.
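
The abstract reports a detection rate and a false positive rate without defining them on this page. Assuming the usual definitions (detection rate = TP / (TP + FN), false positive rate = FP / (FP + TN)), the following minimal sketch shows how the two figures would be computed from predicted labels. The function name and the toy labels are hypothetical and are not taken from the paper; this illustrates only the metrics, not the RDEM algorithm.

```python
def detection_and_false_positive_rates(y_true, y_pred):
    """Return (detection rate, false positive rate) for binary labels.

    Assumed convention (not from the paper): 1 = spam, 0 = legitimate mail.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail passed
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one spam and one legitimate example")
    return tp / (tp + fn), fp / (fp + tn)


# Toy data, purely illustrative: 4 spam and 6 legitimate messages.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
dr, fpr = detection_and_false_positive_rates(y_true, y_pred)
print(f"detection rate = {dr:.2%}, false positive rate = {fpr:.2%}")
```

On the toy data, three of the four spam messages are caught and one of the six legitimate messages is flagged, so a low false positive rate means few legitimate emails are misclassified as spam.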

Url: https://api.istex.fr/document/76B575CDBF69BDF5256683067175D726D5D4889A/fulltext/pdf
DOI: 10.1007/978-3-642-03348-3_17


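Affiliations:

Yan Gao: Dept. of EECS, Northwestern University, Evanston, IL (United States)
Ming Yang: NEC Laboratories America, Cupertino, CA (United States)
Alok Choudhary: Dept. of EECS, Northwestern University, Evanston, IL (United States)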

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach</title>
<author>
<name sortKey="Gao, Yan" sort="Gao, Yan" uniqKey="Gao Y" first="Yan" last="Gao">Yan Gao</name>
</author>
<author>
<name sortKey="Yang, Ming" sort="Yang, Ming" uniqKey="Yang M" first="Ming" last="Yang">Ming Yang</name>
</author>
<author>
<name sortKey="Choudhary, Alok" sort="Choudhary, Alok" uniqKey="Choudhary A" first="Alok" last="Choudhary">Alok Choudhary</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:76B575CDBF69BDF5256683067175D726D5D4889A</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-03348-3_17</idno>
<idno type="url">https://api.istex.fr/document/76B575CDBF69BDF5256683067175D726D5D4889A/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001517</idno>
<idno type="wicri:Area/Istex/Curation">001430</idno>
<idno type="wicri:Area/Istex/Checkpoint">000344</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Gao Y:semi:supervised:image</idno>
<idno type="wicri:Area/Main/Merge">000830</idno>
<idno type="wicri:Area/Main/Curation">000822</idno>
<idno type="wicri:Area/Main/Exploration">000822</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach</title>
<author>
<name sortKey="Gao, Yan" sort="Gao, Yan" uniqKey="Gao Y" first="Yan" last="Gao">Yan Gao</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of EECS, Northwestern University, Evanston, IL</wicri:regionArea>
<placeName>
<region type="state">Illinois</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Yang, Ming" sort="Yang, Ming" uniqKey="Yang M" first="Ming" last="Yang">Ming Yang</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>NEC Laboratories America, Cupertino, CA</wicri:regionArea>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: myang@sv.nec-labs.com</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Choudhary, Alok" sort="Choudhary, Alok" uniqKey="Choudhary A" first="Alok" last="Choudhary">Alok Choudhary</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of EECS, Northwestern University, Evanston, IL</wicri:regionArea>
<placeName>
<region type="state">Illinois</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">76B575CDBF69BDF5256683067175D726D5D4889A</idno>
<idno type="DOI">10.1007/978-3-642-03348-3_17</idno>
<idno type="ChapterID">17</idno>
<idno type="ChapterID">Chap17</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Image spam is a new trend in email spam. Recent image spam employs a variety of image processing techniques to add random noise. In this paper, we propose a semi-supervised approach, the regularized discriminant EM algorithm (RDEM), to detect image spam emails. It leverages a small amount of labeled data and a large amount of unlabeled data to identify spam and train a classification model simultaneously. Compared with fully supervised learning algorithms, semi-supervised learning is better suited to adversarial classification problems, because spammers actively protect their work by constantly making changes to circumvent spam detection, which makes it too costly for fully supervised learning to frequently collect sufficient labeled data for training. Experimental results demonstrate that our approach achieves a detection rate of 91.66% with a false positive rate of less than 2.96%, while significantly reducing the labeling cost.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Californie</li>
<li>Illinois</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Illinois">
<name sortKey="Gao, Yan" sort="Gao, Yan" uniqKey="Gao Y" first="Yan" last="Gao">Yan Gao</name>
</region>
<name sortKey="Choudhary, Alok" sort="Choudhary, Alok" uniqKey="Choudhary A" first="Alok" last="Choudhary">Alok Choudhary</name>
<name sortKey="Choudhary, Alok" sort="Choudhary, Alok" uniqKey="Choudhary A" first="Alok" last="Choudhary">Alok Choudhary</name>
<name sortKey="Gao, Yan" sort="Gao, Yan" uniqKey="Gao Y" first="Yan" last="Gao">Yan Gao</name>
<name sortKey="Yang, Ming" sort="Yang, Ming" uniqKey="Yang M" first="Ming" last="Yang">Ming Yang</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000822 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000822 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:76B575CDBF69BDF5256683067175D726D5D4889A
   |texte=   Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024